Module 01

Module 01 Portfolio Check

  • Installation check
    • Completion status:
    • Comments:
  • Portfolio repo setup
    • Completion status:
    • Comments:
  • RMarkdown Pretty PDF Challenge
    • Completion status:
    • Comments:
  • Evidence worksheet_01
    • Completion status:
    • Comments:
  • Evidence worksheet_02
    • Completion status:
    • Comments:
  • Evidence worksheet_03
    • Completion status:
    • Comments:
  • Problem Set_01
    • Completion status:
    • Comments:
  • Problem Set_02
    • Completion status:
    • Comments:
  • Writing assessment_01
    • Completion status:
    • Comments:
  • Additional Readings
    • Completion status:
    • Comments:

Data Science Friday

Installation check

Use this space to include your installation screenshots.

Portfolio repo setup

Detail the code you used to create, initialize, and push your portfolio repo to GitHub. This will be helpful as you will need to repeat many of these steps to update your portfolio throughout the course.

$ mkdir MICB425_portfolio #make portfolio directory within desired directory
$ cd MICB425_portfolio #go to new directory
$ git init #designate it as a repo
$ touch ID.txt #create blank ID.txt file
$ git add . #stage all files in new repo for commit
$ git commit -m "First commit" #commit files
$ git remote add origin https://github.com/ryankn/MICB425_portfolio #designate remote repo URL
$ git remote -v #verify remote repo URL
$ git push -u origin master #push local repo to remote repo

RMarkdown pretty PDF challenge

The following is from the activity of recreating the example PDF, with the header levels changed such that they won’t appear in the table of contents.

R Markdown PDF Challenge

The following assignment is an exercise for the reproduction of this .html document using the RStudio and RMarkdown tools we’ve shown you in class. Hopefully by the end of this, you won’t feel at all the way this poor PhD student does. We’re here to help, and when it comes to R, the internet is a really valuable resource. This open-source program has all kinds of tutorials online.

http://phdcomics.com/ Comic posted 1-17-2018

Challenge Goals

The goal of this R Markdown html challenge is to give you an opportunity to play with a bunch of different RMarkdown formatting. Consider it a chance to flex your RMarkdown muscles. Your goal is to write your own RMarkdown that rebuilds this html document as close to the original as possible. So, yes, this means you get to copy my irreverent tone exactly in your own Markdowns. It’s a little window into my psyche. Enjoy =)

hint: go to the PhD Comics website to see if you can find the image above
If you can’t find that exact image, just find a comparable image from the PhD Comics website and include it in your markdown

Here’s a header!

Let’s be honest, this header is a little arbitrary. But show me that you can reproduce headers with different levels please. This is a level 3 header, for your reference (you can most easily tell this from the table of contents).

Another header, now with maths

Perhaps you’re already really confused by the whole markdown thing. Maybe you’re so confused that you’ve forgotten how to add. Never fear! A calculator R is here:

1231521+12341556280987
## [1] 1.234156e+13

Table Time

Or maybe, after you’ve added those numbers, you feel like it’s about time for a table!
I’m going to leave all the guts of the coding here so you can see how libraries (R packages) are loaded into R (more on that later). It’s not terribly pretty, but it hints at how R works and how you will use it in the future. The summary function used below is a nice data exploration function that you may use in the future.

library(knitr)
kable(summary(cars),caption="I made this table with kable in the knitr package library")

I made this table with kable in the knitr package library

    speed           dist
Min.   : 4.0    Min.   :  2.00
1st Qu.:12.0    1st Qu.: 26.00
Median :15.0    Median : 36.00
Mean   :15.4    Mean   : 42.98
3rd Qu.:19.0    3rd Qu.: 56.00
Max.   :25.0    Max.   :120.00

And now you’ve almost finished your first RMarkdown! Feeling excited? We are! In fact, we’re so excited that maybe we need a big finale eh? Here’s ours! Include a fun gif of your choice!

Saanich Metadata Exercises

R code from work for Data Science Friday on 26 Jan 18.

#Libraries
#install.packages("tidyverse")
library("tidyverse")
## -- Attaching packages ------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 2.2.1     v purrr   0.2.4
## v tibble  1.4.2     v dplyr   0.7.4
## v tidyr   0.8.0     v stringr 1.2.0
## v readr   1.1.1     v forcats 0.2.0
## -- Conflicts ---------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
#Data Import
metadata <- read.table(file="DS_Friday/26Jan18/Saanich.metadata.txt", header=TRUE, row.names=1, sep="\t", na.strings="NAN")

#Exercise 1
OTU <- read.table(file="DS_Friday/26Jan18/Saanich.OTU.txt", header=TRUE, row.names=1, sep="\t", na.strings="NAN")

#Exercise 2
metadata %>% rownames_to_column('sample') %>% 
  filter(CH4_nM >= 100 & Temperature_C <= 10) %>% 
  column_to_rownames('sample') %>% 
  select(Depth_m,CH4_nM,Temperature_C)
##              Depth_m  CH4_nM Temperature_C
## SI072_S3_185     185 310.068         9.091
## SI072_S3_200     200 774.034         9.117
newtable <- 
  metadata %>% rownames_to_column('sample') %>% 
  select(matches("nM|sample")) %>% 
  mutate(N2O_uM = N2O_nM/1000, Std_N2O_uM = Std_N2O_nM/1000, CH4_uM = CH4_nM/1000, Std_CH4_uM = Std_CH4_nM/1000) %>% 
  column_to_rownames('sample')

#Exercise 3
metadata %>% rownames_to_column('sample') %>% 
  select(matches("nM|sample")) %>% 
  mutate(N2O_uM = N2O_nM/1000, Std_N2O_uM = Std_N2O_nM/1000, CH4_uM = CH4_nM/1000, Std_CH4_uM = Std_CH4_nM/1000) %>% 
  column_to_rownames('sample') 
##              N2O_nM Std_N2O_nM   CH4_nM Std_CH4_nM   N2O_uM Std_N2O_uM
## SI072_S3_010  0.849      0.114 1030.478      3.070 0.000849   0.000114
## SI072_S3_020 13.199      0.000   29.012      0.000 0.013199   0.000000
## SI072_S3_040 12.829      1.509   37.146      2.695 0.012829   0.001509
## SI072_S3_060 12.306      0.524   36.501      3.521 0.012306   0.000524
## SI072_S3_075 13.896      1.417   24.013      0.435 0.013896   0.001417
## SI072_S3_085 12.959      0.955    7.376      0.029 0.012959   0.000955
## SI072_S3_090 15.551      1.417    4.190      0.159 0.015551   0.001417
## SI072_S3_097 18.682      1.628    3.991      0.759 0.018682   0.001628
## SI072_S3_100 18.087      1.275    3.231      0.392 0.018087   0.001275
## SI072_S3_110 15.843      1.953    3.633      0.127 0.015843   0.001953
## SI072_S3_120 16.304      1.085    3.463      0.519 0.016304   0.001085
## SI072_S3_135 12.909      2.577    4.815      0.658 0.012909   0.002577
## SI072_S3_150 11.815      0.000    8.323      0.000 0.011815   0.000000
## SI072_S3_165  6.310      0.732   23.831      2.291 0.006310   0.000732
## SI072_S3_185  0.000      0.000  310.068      0.000 0.000000   0.000000
## SI072_S3_200  0.000      0.000  774.034     12.745 0.000000   0.000000
##                CH4_uM Std_CH4_uM
## SI072_S3_010 1.030478   0.003070
## SI072_S3_020 0.029012   0.000000
## SI072_S3_040 0.037146   0.002695
## SI072_S3_060 0.036501   0.003521
## SI072_S3_075 0.024013   0.000435
## SI072_S3_085 0.007376   0.000029
## SI072_S3_090 0.004190   0.000159
## SI072_S3_097 0.003991   0.000759
## SI072_S3_100 0.003231   0.000392
## SI072_S3_110 0.003633   0.000127
## SI072_S3_120 0.003463   0.000519
## SI072_S3_135 0.004815   0.000658
## SI072_S3_150 0.008323   0.000000
## SI072_S3_165 0.023831   0.002291
## SI072_S3_185 0.310068   0.000000
## SI072_S3_200 0.774034   0.012745

Data Science Friday Assignment Feb 16

R code for Data Science Friday assignment due Friday 16 Feb 18.

#Package Installation
#install.packages("tidyverse")
#source("https://bioconductor.org/biocLite.R")
#biocLite("phyloseq")

#Libraries
library("tidyverse")
library("phyloseq")

#Data Import
new_OTUs <- 
  read.table("DS_Friday/Assignment20180208/Saanich.OTU.new.txt",
             header = TRUE, sep = "\t", row.names = 1, na.strings = "NAN")
new_metadata <- 
  read.table("DS_Friday/Assignment20180208/Saanich.metadata.new.txt",
             header = TRUE, sep = "\t", row.names = 1, na.strings = "NAN")
load("DS_Friday/Assignment20180208/phyloseq_object.RData") 

#Exercise 1
ggplot(new_metadata, aes(x = CH4_nM, y = Depth_m)) +
  geom_point(color = "purple", shape = 17)

#Exercise 2
new_metadata %>%
  mutate(Temperature_F = Temperature_C * 9 / 5 + 32) %>%
  ggplot(aes(x = Temperature_F, y = Depth_m)) +
  geom_point()

#Exercise 3
physeq_percent = transform_sample_counts(physeq, function(x) 100 * x/sum(x))
plot_bar(physeq_percent, fill="Domain") + 
  geom_bar(aes(fill=Domain), stat="identity") +
  labs(x = "Sample depth", y = "Relative abundance (%)", title = "Domains from 10 to 200 m in Saanich Inlet")

#Exercise 4
new_metadata %>%
  select(matches("uM|depth"),-matches("Std"),-H2S_uM) %>%
  gather(key = "Nutrient", value = "Concentration", -Depth_m) %>%
  ggplot(., aes(x = Depth_m, y = Concentration)) +
  geom_point() +
  geom_line() +
  facet_wrap( ~ Nutrient, scales = "free") +
  theme(legend.position = "none") +
  labs(x = "Depth (m)", y = expression(paste("Concentration (", mu, "M)")))

Origins and Earth Systems

Evidence worksheet 01

The first thing in any assignment should be link(s) to any relevant literature (which should also be included as full citations in a module references section below).

Whitman et al. 1998

Learning objectives

Describe the numerical abundance of microbial life in relation to ecology and biogeochemistry of Earth systems.

General questions

  • What were the main questions being asked?
    What is the total number of prokaryotes and the total amount of their cellular carbon on Earth? Secondary questions: how much N and P (alongside C) do prokaryotes contain, and what are their turnover times?

  • What were the primary methodological approaches used?
    To make the estimate tractable, the authors examined the three large habitats where current knowledge suggests most prokaryotes reside: aquatic environments, soil, and the subsurface. All numbers came from previously published papers reporting figures such as CFU/mL counts, volume estimates, or C content.

  • Summarize the main results or findings.
    Prokaryotic C is roughly 60-100% of the estimated C in plants, while prokaryotes contain about 10-fold more N and P than plants do.

Analysis of the subsurface prokaryotic community suggests the turnover time is extremely long, on the order of [].

  • Do new questions arise from the results?
    How is the turnover time for the subsurface community so long? At that kind of turnover rate, can it still be considered life?

  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
    The terminology was hard to follow, in particular what the units actually mean.

Problem set 01

Learning objectives:

Describe the numerical abundance of microbial life in relation to the ecology and biogeochemistry of Earth systems.

Specific questions:

  • What are the primary prokaryotic habitats on Earth and how do they vary with respect to their capacity to support life? Provide a breakdown of total cell abundance for each primary habitat from the tables provided in the text.
    Habitat       Abundance (cells)
    Aquatic       1.161 x 10^29
    Soil          2.556 x 10^29
    Subsurface    3.8 x 10^30
  • What is the estimated prokaryotic cell abundance in the upper 200 m of the ocean and what fraction of this biomass is represented by marine cyanobacterium including Prochlorococcus? What is the significance of this ratio with respect to carbon cycling in the ocean and the atmospheric composition of the Earth?
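    About 3.6 x 10^28 cells, at an average density of 5 x 10^5 cells/mL. Cyanobacteria are present at about 4 x 10^4 cells/mL, so they make up about 8% of the cells:

4 x 10^4 cells/mL divided by 5 x 10^5 cells/mL = 8%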

  • What is the difference between an autotroph, heterotroph, and a lithotroph based on information provided in the text?
    An autotroph fixes inorganic carbon (e.g. CO2) into biomass, a heterotroph assimilates organic carbon, and a lithotroph consumes inorganic substrates.

  • Based on information provided in the text and your knowledge of geography what is the deepest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this depth?
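    Deep subsurface habitats, both terrestrial and marine, extend up to about 4 km below the surface. The limiting factor is temperature: the upper limit is about 125 degrees C, and temperature rises about 22 degrees C per km of depth.

The Mariana Trench is about 10.9 km deep, so the deepest habitat would lie roughly 4 km below its floor.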

  • Based on information provided in the text and your knowledge of geography what is the highest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this height?

Mount Everest is 8.8 km high. Is anything really alive in the atmosphere at 77 km? That doesn’t seem likely: nutrients and moisture are lacking, and there is a lot of UV radiation too. Let’s say 20 km.

  • Based on estimates of prokaryotic habitat limitation, what is the vertical distance of the Earth’s biosphere measured in km?
    Thus the vertical distance is about 24 km from top to bottom, from the tip of Mount Everest (8.8 km) to 4-5 km under the Mariana Trench (10.9 km deep).

  • How was annual cellular production of prokaryotes described in Table 7 column four determined? (Provide an example of the calculation)
    Annual cellular production of prokaryotes was calculated based on literature values for population size and population turnover time in days. In the following example calculation, population size is P, turnover time is T, and annual cellular production is A.

\[A = P \times \frac{365}{T}\]

\[A = 3.6 \times 10^{28}\ \text{cells} \times \frac{365\ \text{days/year}}{16\ \text{days/turnover}} = 8.2 \times 10^{29}\ \text{cells/year}\]
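As a quick check of the formula above, the calculation can be scripted in R. This is only a minimal sketch: the function name `annual_production` and its argument names are made up for illustration, and the inputs are the values quoted in the example (3.6 x 10^28 cells and a 16-day turnover time).

```r
# Annual cellular production: A = P * 365 / T
# P = population size (cells), T = turnover time (days)
# Function and argument names are illustrative, not from the paper.
annual_production <- function(population_size, turnover_days) {
  population_size * 365 / turnover_days
}

# Values from the worked example above
A <- annual_production(population_size = 3.6e28, turnover_days = 16)
print(signif(A, 2))  # about 8.2e29 cells per year
```

Writing the formula as a function makes it easy to repeat the calculation for other rows of the table by changing the two inputs.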

  • What is the relationship between carbon content, carbon assimilation efficiency and turnover rates in the upper 200m of the ocean? Why does this vary with depth in the ocean and between terrestrial and marine habitats?
    Assume a carbon assimilation efficiency of 20%, so the net productivity required is 4x the carbon content of the population (why not 5x? This is still unclear to me). Assume a C content of 20 fg/cell, which is 20 x 10^-30 Pg/cell. Multiplying the number of cells (3.6 x 10^28) by the C content per cell gives about 0.72 Pg C in marine heterotrophs.

0.72 Pg C x 4 = 2.88 Pg C required per turnover

51 Pg C per year of net productivity x 85% = 43 Pg C per year goes to the upper 200 m. 43 / 2.88 = 14.9 turnovers per year, and 365 / 14.9 = 24.5 days per turnover.

Why does this vary with depth? Production and consumption of C differ between habitats.

Carbon assimilation efficiency and carbon content determine turnover rates in the upper 200 m of the ocean. The amount of net primary productivity required to sustain prokaryotic turnover depends on both the C assimilation efficiency and the total carbon content of the population, which then sets an upper limit on turnover rates. These rates vary between habitats because of differences in assimilation efficiency and total carbon content, as well as in the amount of total net primary productivity each habitat zone consumes.

Viruses also matter: they kill cells, which drives turnover, and they carry accessory metabolic genes that supplement the metabolic capacities of the community when they infect cells.
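The carbon budget arithmetic in this answer can be sketched in R. Variable names are illustrative only, and the inputs are the figures used in the notes above: 3.6 x 10^28 cells, 20 x 10^-30 Pg C per cell, a 4x productivity requirement, and 85% of 51 Pg C per year (rounded to 43, as in the notes).

```r
# Carbon budget for heterotrophs in the upper 200 m of the ocean.
# All inputs are the figures from the notes; names are illustrative.
cells             <- 3.6e28   # number of cells
c_per_cell_pg     <- 20e-30   # carbon per cell, in Pg (20 fg)
productivity_mult <- 4        # net productivity needed per unit of cell carbon

standing_c     <- cells * c_per_cell_pg          # about 0.72 Pg C
c_per_turnover <- standing_c * productivity_mult # about 2.88 Pg C per turnover

# 51 Pg C/year * 0.85 = 43.35, rounded to 43 to match the notes
c_available <- 43

turnovers_per_year <- c_available / c_per_turnover  # about 14.9
days_per_turnover  <- 365 / turnovers_per_year      # about 24.5

print(round(c(standing_c, c_per_turnover, turnovers_per_year, days_per_turnover), 2))
```

Using the unrounded 43.35 Pg C per year instead of 43 gives about 15.1 turnovers per year and about 24.3 days per turnover, so the rounding changes the result only slightly.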

  • How were the frequency numbers for four simultaneous mutations in shared genes determined for marine heterotrophs and marine autotrophs given an average mutation rate of 4 x 10^-7 per DNA replication? (Provide an example of the calculation with units. Hint: cell and generation cancel out)

4 x 10^-7 mutations/generation

(4 x 10^-7)^4 = 2.56 x 10^-26 mutations/generation (four simultaneous mutations)

365 days/year / 16 days/turnover = 22.8 turnovers per year

3.6 x 10^28 cells x 22.8 turnovers/year = 8.2 x 10^29 cells/year

8.2 x 10^29 cells/year x 2.56 x 10^-26 mutations/generation = 2.1 x 10^4 mutations/year

Convert to hours by dividing by 365 x 24:

2.1 x 10^4 / 365 / 24 = 2.4 mutations/hour

1 / 2.4 = 0.4 hours/mutation
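The chain of steps above can be reproduced in R as a sanity check. This is a rough sketch with illustrative variable names; it uses the mutation rate from the question (4 x 10^-7 per replication) and the population size and turnover time from the annual production example (3.6 x 10^28 cells, 16 days).

```r
# Frequency of four simultaneous mutations in a marine population.
# Names are illustrative; inputs are taken from the worked steps above.
mutation_rate <- 4e-7    # mutations per gene per replication
n_mutations   <- 4       # number of simultaneous mutations
cells         <- 3.6e28  # population size
turnover_days <- 16      # turnover time in days

p_simultaneous  <- mutation_rate^n_mutations      # 2.56e-26
cells_per_year  <- cells * 365 / turnover_days    # about 8.2e29
events_per_year <- cells_per_year * p_simultaneous  # about 2.1e4
events_per_hour <- events_per_year / (365 * 24)     # about 2.4
hours_per_event <- 1 / events_per_hour              # about 0.4

print(signif(c(events_per_year, events_per_hour, hours_per_event), 2))
```

Carrying full precision through every step, rather than rounding at each line, still gives about 2.4 events per hour.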

  • Given the large population size and high mutation rate of prokaryotic cells, what are the implications with respect to genetic diversity and adaptive potential? Are point mutations the only way in which microbial genomes diversify and adapt?

  • What relationships can be inferred between prokaryotic abundance, diversity, and metabolic potential based on the information provided in the text?

Evidence Worksheet 02 “Life and the Evolution of Earth’s Atmosphere”

Learning objectives:

Comment on the emergence of microbial life and the evolution of Earth systems.

Specific Questions:

  • Indicate the key events in the evolution of Earth systems at each approximate moment in the time series. If times need to be adjusted or added to the timeline to fully account for the development of Earth systems, please do so.

    • 4.6 billion years ago
      Formation of the solar system and Earth. At this point, Earth is a large molten rock.
    • 4.5 billion years ago
      Formation of the moon, giving Earth spin, tilt, day/night cycles, seasons
    • 4.2 billion years ago
      Darwinian threshold
    • 4.1 billion years ago
      First evidence of life found in graphite within zircons
    • 3.8 billion years ago
      Oceans, prokaryotic cells, Rubisco, stable sea chemistry, halted meteorite bombardment, methanogenesis
    • 3.75 billion years ago
      Photosynthesis
    • 3.5 billion years ago
      Cyanobacteria, microbial life on land
    • 3.0 billion years ago
      First global glaciation event
    • 2.7 billion years ago
      Great oxidation event, red beds
    • 2.2 billion years ago
      First snowball earth (glaciation event)
    • 2.1 billion years ago
      Appearance of eukaryotes, multicellular life, endosymbiosis
    • 1.3 billion years ago

    • 550 million years ago
      Cambrian explosion, complex multicellular life, emergence of animals
    • 400 million years ago
      Land plants emerge, followed by a period of gigantism because of the high-oxygen atmosphere
    • 200,000 years ago
      First humans

  • Describe the dominant physical and chemical characteristics of Earth systems at the following waypoints:

    • Hadean
      Dry hot surface of planet (~500 C), 90 bar pressure on Earth’s surface, CO2 sequestered in carbonate minerals such as limestone
    • Archean
      Strong greenhouse gases predominate
    • Precambrian

    • Proterozoic
      Oxidation of the early atmosphere - microaerobic
    • Phanerozoic
      Increased oxygenation of the atmosphere, mass extinction events from meteorite impacts

Problem set 02 “Microbial Engines”

Learning objectives:

Discuss the role of microbial diversity and formation of coupled metabolism in driving global biogeochemical cycles.

Specific Questions:

  • What are the primary geophysical and biogeochemical processes that create and sustain conditions for life on Earth? How do abiotic versus biotic processes vary with respect to matter and energy transformation and how are they interconnected?
    Geophysical processes - tectonics, atmospheric photochemical processes

Biogeochemical processes - H, C, N, O, S, and P fluxes

Abiotic chemical processes tend to be based on acid/base reactions, while biotic ones are based on redox reactions. The reactions are nested: abiotic processes provide the e- acceptors that the biotic reactions use, as well as C, S, and P via tectonics, volcanism, and weathering(?)

  • Why is Earth’s redox state considered an emergent property?
    It is an emergent property of microbial life on Earth.

  • How do reversible electron transfer reactions give rise to element and nutrient cycles at different ecological scales? What strategies do microbes use to overcome thermodynamic barriers to reversible electron flow?
    Microbes overcome these barriers through synergistic multi-species assemblages that together carry out the overall pathway.

  • Using information provided in the text, describe how the nitrogen cycle partitions between different redox “niches” and microbial groups. Is there a relationship between the nitrogen cycle and climate change?
    Each step of the cycle is a separate redox niche:
    • Nitrification: NH4 -> NO2 is one niche and NO2 -> NO3 is another; it typically involves CO2 fixation to organic matter
    • Anammox: NH4 + NO2 -> N2
    • N fixation: N2 -> NH4 (reduction)
    • Denitrification: NO2 or NO3 -> NO -> N2O -> N2; N2O, a greenhouse gas, can be released
    • Dissimilatory nitrate reduction to ammonium (DNRA): NO3 -> NO2 -> NH4

  • What is the relationship between microbial diversity and metabolic diversity and how does this relate to the discovery of new protein families from microbial community genomes?

  • On what basis do the authors consider microbes the guardians of metabolism?

Module 01 references

Utilize this space to include a bibliography of any literature you want associated with this module. We recommend keeping this as the final header under each module.

Whitman WB, Coleman DC, Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proc Natl Acad Sci U S A 95:6578-6583. PMC33863